© John Wiley & Sons, Inc.
FIGURE 3-1: Distribution of the number of private and public airports in 2011 in the population (of 50 states and the District of
Columbia), and four different samples of 20 states from the same population.
As shown in Figure 3-1, comparing the histograms of the sample distributions with the histogram of the
population reveals differences. Sample 2 looks much more like the population than Sample 4 does.
However, all four are valid samples, because each was randomly selected from the population. Each
sample is only an approximation of the true population distribution. Likewise, the mean and standard
deviation of each sample are likely to be close to, but not exactly equal to, the mean and standard
deviation of the population. (For a refresher on mean and standard deviation, see Chapter 9.) This is
sampling error: valid samples from a population almost always differ somewhat from the population
itself, and that is true of any random sample.
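To see sampling error happen, you can simulate it. The following Python sketch does not use the real airport data from Figure 3-1; it builds a made-up population of 51 values (standing in for the 50 states plus the District of Columbia), draws four random samples of 20 without replacement, and prints the mean and standard deviation of each. The function names and the value range are hypothetical, chosen only for illustration.

```python
import random
import statistics


def make_hypothetical_population(size=51, seed=1):
    """Made-up counts for 51 units (stand-ins for the 50 states plus DC).
    These are NOT the real 2011 airport counts shown in Figure 3-1."""
    rng = random.Random(seed)
    return [rng.randint(5, 300) for _ in range(size)]


def draw_samples(population, n_samples=4, sample_size=20, seed=2):
    """Draw several simple random samples, each without replacement."""
    rng = random.Random(seed)
    return [rng.sample(population, sample_size) for _ in range(n_samples)]


def summarize(values, is_population=False):
    """Return (mean, standard deviation).

    Uses the population SD (divide by N) for the whole population and the
    sample SD (divide by n - 1) for a sample."""
    sd = statistics.pstdev(values) if is_population else statistics.stdev(values)
    return statistics.mean(values), sd


if __name__ == "__main__":
    population = make_hypothetical_population()
    pop_mean, pop_sd = summarize(population, is_population=True)
    print(f"Population: mean={pop_mean:.1f}, SD={pop_sd:.1f}")
    for i, sample in enumerate(draw_samples(population), start=1):
        mean, sd = summarize(sample)
        print(f"Sample {i}: mean={mean:.1f}, SD={sd:.1f}")
```

Each sample's mean and SD typically land near the population values without matching them exactly, and changing the `seed` argument of `draw_samples` gives a different set of valid samples with different summaries.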
Digging into probability distributions
As described in the preceding section, samples differ from populations because of random
fluctuations. Because these random fluctuations follow patterns, statisticians can describe their
behavior quantitatively using mathematical equations called probability distribution functions. A
probability distribution function describes how likely it is that random fluctuations will exceed any
given magnitude. A probability distribution can be represented in several
ways:
As a mathematical equation that calculates the chance that a fluctuation will be of a certain
magnitude. Using calculus, this function can be integrated, that is, turned into a related
function that calculates the probability that a fluctuation will be at least as large as a certain
magnitude.
As a graph of the distribution, which looks and works much like a histogram.
As a table of values indicating how likely it is that random fluctuations will exceed a certain
magnitude.
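The three representations can be sketched for one concrete distribution. The Python example below assumes the standard normal distribution purely as an illustration (the text has not yet singled out any particular distribution). `normal_pdf` is the equation form; `upper_tail_numeric` integrates that equation from z out to a far cutoff with Simpson's rule to get the probability of a fluctuation at least that large, and `upper_tail_exact` gives the same quantity in closed form via `math.erfc`; `tail_table` produces the table form. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Equation form: normal density for a fluctuation of size x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def upper_tail_exact(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def upper_tail_numeric(z, upper=10.0, steps=10_000):
    """Approximate P(Z >= z) by integrating the density from z to `upper`
    with Simpson's rule. `steps` must be even and z must be below `upper`;
    the area beyond `upper` is negligible for a standard normal."""
    h = (upper - z) / steps
    total = normal_pdf(z) + normal_pdf(upper)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 1, 4, 2, 4, ..., 4, 1
        total += weight * normal_pdf(z + i * h)
    return total * h / 3


def tail_table(z_values):
    """Table form: chance a fluctuation is at least z SDs from the mean
    in either direction (two-sided)."""
    return [(z, 2 * upper_tail_exact(z)) for z in z_values]


if __name__ == "__main__":
    print(f"density at z=1.96:    {normal_pdf(1.96):.4f}")
    print(f"P(Z >= 1.96) exact:   {upper_tail_exact(1.96):.4f}")
    print(f"P(Z >= 1.96) numeric: {upper_tail_numeric(1.96):.4f}")
    for z, p in tail_table([0.0, 1.0, 1.96, 2.0, 3.0]):
        print(f"|fluctuation| >= {z:.2f} SD: {p:.4f}")
```

The numeric and closed-form tail probabilities agree to many decimal places, which is the integration step described in the first representation; a graph of `normal_pdf` over a range of x values would give the second representation.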
In the following sections, we break down two types of distributions: those that describe fluctuations in
your data, and those that you encounter when performing statistical tests.
Distributions that describe your data